DOCUMENT RESUME 

ED 389 229 FL 023 449 



AUTHOR       Orr, Thomas; And Others
TITLE        An Analysis of Lexical Frequency and Discourse Need
             for Instructive Purposes. Technical Report
             95-5-002.
INSTITUTION  Aizu Univ., Aizuwakamatsu (Japan). Center for
             Language Research.
PUB DATE     17 Apr 95
NOTE         16p.
PUB TYPE     Reports - Research/Technical (143)



EDRS PRICE MF01/PC01 Plus Postage. 

DESCRIPTORS  Computer Science; Computer Software; Discourse
             Analysis; *Educational Needs; English (Second
             Language); *English for Special Purposes; Foreign
             Countries; *Freshman Composition; Higher Education;
             Second Language Instruction; Technical Writing;
             *Vocabulary; *Word Frequency; Writing Instruction



ABSTRACT 

The report details a study to identify vocabulary 
needed by Japanese students of English as a Second Language to 
function successfully in college computer science courses and 
research laboratory apprenticeships. The vocabulary was then to be 
taught in the first two semesters of freshman English composition. 
The study involved development of simple computer software and 
administration of an electronic questionnaire and individual 
interviews. The software was designed to study lexical frequency in 
the published annual review of the university's school of computer 
science and engineering. Software specifications are given here. The 
resulting vocabulary is also presented, in four categories: 
university academic vocabulary; university physical environment 
vocabulary; people-related vocabulary; and computer science discourse 
vocabulary. It was found that most of the words students need to know 
to negotiate computer science content materials in English are words 
common to general English rather than words of a semi- or highly
technical nature. Contains 17 references. (MSE) 



* Reproductions supplied by EDRS are the best that can be made
* from the original document.






Technical Report 95-5-002






An Analysis of Lexical Frequency 
and Discourse Need 
for Instructive Purposes 



Thomas Orr, Kiel Christianson,
Christian Goetze, Hideaki Okawara



April 17, 1995











Center for Language Research 
University of Aizu 



Aizuwakamatsu, Fukushima
965-80 Japan











The technical reports are published for early dissemination of research results by the
members of the University of Aizu. The completed results may be submitted later to
journals and conferences for publication.






Technical Report 95-5-002 



Title: 



An analysis of lexical frequency and discourse need for instructive purposes 



Authors: 

Thomas Orr, Kiel Christianson, Christian Goetze, Hideaki Okawara



Key Words and Phrases: 

English for Specific Purposes (ESP), English for Science and Technology (EST), lexical 
frequency, academic vocabulary, computer science vocabulary, needs analysis 



Abstract: 

The research activities detailed in this technical report consist of simple research software 
development and application along with electronic questionnaires and face-to-face interviews 
for the purpose of identifying a corpus of English words to be taught in freshman Composition 
1 and 2. It is believed that mastery of these words is necessary for University of Aizu students 
to be able to function successfully in their computer courses and research lab apprenticeships. 



Report Date: 
April 17, 1995 



Written Language: 
English 



Any Other Identifying Information of this Report: 



Distribution Statement: 
First Issue: 100 copies 



Supplementary Notes: 

This research was conducted by the Composition Courseware Research Team as part of its 
responsibilities to develop effective composition instruction for University of Aizu freshmen. 



Center for Language Research 
University of Aizu 

Aizu-Wakamatsu City 
Fukushima 965-80 Japan






An Analysis of Lexical Frequency
and Discourse Need
for Instructive Purposes

Thomas Orr, Kiel Christianson,
Christian Goetze, Hideaki Okawara



1 Introduction 

The defining characteristic of an ESP[1] approach to English language instruction is its thorough
and continual needs analysis which identifies target language needs and facilitates the formation 
of instructional priorities. The Center for Language Research (CLR) at the University of Aizu 
exemplifies this trend in language education as it studies the English language needs of students 
and professionals in computer science and develops instruction material and services to enable 
nonnative speakers of English (NNSs) to function more successfully within the international 
computer science discipline. 

One of the CLR teams active in needs analysis is the English Composition Courseware
Team,[2] which is responsible for developing the syllabus for the first two semesters of student
writing instruction. The team's research activity detailed in this technical report consists of 
simple research software development and application along with electronic questionnaires and 
face-to-face interviews for the purpose of identifying a prospective English corpus for instructive 
purposes. These words would be taught in Composition 1 and 2 to help University of Aizu
students function successfully in their computer courses and research lab apprenticeships.

Though knowledge of the English vocabulary listed in this report will be essential for fresh- 
man students at this university, the authors recognize that the material covered here may also 
serve broader needs. First, the English vocabulary will be extremely useful to high school stu- 
dents preparing for coursework at the University of Aizu; and second, the discussion of research 
methodology will be useful to ESP educators who wish to initiate vocabulary research at their 
own locale. It is, thus, with a desire to provide genuinely useful material to both students and 
educators that we offer this technical report to our readers.

[1] English for Specific Purposes

[2] The members of the English Composition Courseware Team are Associate Professor Thomas Orr (Center for
Language Research), Assistant Professor Kiel Christianson (Center for Language Research), Assistant Professor
Christian Goetze (Department of Computer Software), and special research assistant Hideaki Okawara (University
of Aizu student).
2 Research Method
2.1 Background 



Lexical needs analysis is not new to ESP; however, it is unfortunately far too rare in most
language programs. In contrast to English for General Purposes, English for Specific Purposes and
its major subdomains (English for Medical Purposes, English for Business Purposes, English for 
Academic Purposes, English for Vocational Purposes, and English for Science and Technology) 
seek to orient language learners to a specific subculture characterized by specific language used 
to accomplish specific tasks. As such, the identification of an English corpus used to convey a 
field's content and used to accomplish its goals is both reasonable and necessary if a language 
program is to be effective. 

Some of the early work in corpus construction and analysis begun in the 1960s (Kucera &
Francis, 1967) has been carried further by ESP specialists in several academic fields. Salager- 
Meyer (1983) and Chandler-Burns (1986, 1987, 1994a, 1994b, 1995), for example, have been 
actively applying computational linguistics in the field of medicine; Inman (1978) conducted a 
well-known lexical analysis of scientific and technical prose; and Bramki & Williams (1984) have
studied the vocabulary of economics. In addition to studies of discipline-specific lexicons, some 
language programs have carried out their own vocabulary studies of language within the local 
university environment in order to prioritize their own English language instruction. Flowerdew's 
1993 study at the Sultan Qaboos University is one good example. These kinds of studies have 
been important contributions to ESP and must continue if programs are to effectively prioritize 
their instruction so that students can learn to perform the most needful language tasks within 
the time allotted to their language studies.

2.2 Knowledge Sought 

The goal of this research project was to identify the English words most needed by our students 
for them to function well in the academic setting here at the University of Aizu. These words 
would be organized under four categories: 1) the vocabulary of academia (to understand class- 
room or research laboratory instruction); 2) the vocabulary of campus locations and objects (to 
understand references to the physical university environment); 3) the vocabulary of university 
people (to understand references to local groups on campus); and 4) the vocabulary of
introductory computer science discourse (to understand written and spoken communication about
computer science). These four lists, however, required four different research tasks. 

The first list (Academic Vocabulary) required a survey of University faculty to identify 
which words were felt to be most critical for students to understand instruction in lectures and 
joint research projects with the faculty. These items were gathered via an electronic survey on 
the campus network and through discussions over a period of two years with faculty who had 
experience teaching our students. 

The second list (University of Aizu Physical Environment Vocabulary) required the consul- 
tation of campus maps and floor plans, university email announcements, and a simple analysis 
of student writing during the first two years of freshman Composition 1 and 2 to learn what 
objects in the physical environment were most often referred to and which items gave local NNSs 
the most difficulty. 

The third list (University of Aizu People Vocabulary) required the consultation of the University
of Aizu Regulations and the application of native-speaker knowledge of standard English
usage in American universities.

The fourth list (University of Aizu Computer Science Discourse Vocabulary) required the
most thought and technical assistance. The first problem was to identify what language records 
would best represent the vocabulary necessary to understand the verbal and written discourse 
of the local computer science culture, and the second problem was how to extract the appro- 
priate lexicon most efficiently. To solve these two problems, the research team decided that the 
development of software to perform a simple computational analysis of lexical frequency on the 
University of Aizu 1993 Annual Review: School of Computer Science and Engineering would be
the wisest choice for the following reasons: 1) the Annual Review contained the best English 
textual record of descriptions of the university, the departments, the centers, and the research
labs; 2) it contained abstracts of all faculty research work during the 1993 academic year; and 
3) the document was on the campus network, which made electronic investigation very easy. 

Though the team had originally considered whether or not to include course textbooks, 
campus email discussions, seminar announcements & abstracts, and university technical reports,
it was decided that the Annual Review would be the one document that gave the most complete 
and most accurate record of local computer science discourse that would include both profiles 
of university programs and discussions of specific university research activities. Once the target 
text was agreed upon, the next step was to design a software tool for the network that could 
easily abstract the needed data. 

2.3 Software Sought 

The software tool for a lexical frequency study needed to be able to do the following at the touch 
of one button: 

• give a total count of words within the text 

• list all words in order of frequency followed by the number of times the word appeared in
the text, e.g., word (number) 

• list all words in alphabetical order followed by the number of times the word appeared in 
the text, e.g., A-word (number), B-word (number) 

• record this data at the end of the text so it could be printed along with the text if desired 
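
The four requirements above can be sketched in a few lines. This is only an illustration in Python, not the team's program (which is described in the next section); the function name format_report and the sample sentence are hypothetical, and the "vocabulary = / words =" header simply mirrors the sample output shown later in this report.

```python
from collections import Counter


def format_report(counts):
    """Build the requested report: totals, a frequency listing, an alphabetical listing."""
    total = sum(counts.values())
    # Most frequent first; ties are broken alphabetically so the output is stable.
    by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    alphabetical = sorted(counts.items())

    def render(items):
        # Each entry uses the "word(number)" form requested in the list above.
        return ", ".join(f"{word}({n})" for word, n in items)

    return "\n".join([
        f"vocabulary = {len(counts)}",
        f"words = {total}",
        "",
        render(by_frequency),
        "",
        render(alphabetical),
    ])


sample = Counter("the design of the system and the model of the design".split())
print(format_report(sample))
```

To meet the last requirement, the returned string could simply be appended to the end of the analyzed text before printing.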

Professor Goetze, the team's technical assistant from the Department of Computer Software,
responded to this request with the following software design.

2.3.1 Software Specifications 

Even though the task of counting words and sentences may appear relatively easy, designing software
that can accomplish this in a sufficiently precise manner can be surprisingly difficult. The main
problem stems from distinguishing between abbreviations and words at the end of sentences. 
Minor problems are encountered when one processes "semi-formatted" text with itemized or 
numbered lists embedded in the text. 

In this project, the following definitions were used: 



1. A white space is either a space, a line break, or the beginning or the end of the text.

2. A word is a sequence of alphabetical characters that may contain the special characters
"-" and [illegible], and that must contain at least one vowel.

3. A sentence is a sequence of words terminated by "\" or [illegible]. There must be a
word immediately preceding those characters, and those characters must be followed by a
white space.

4. Any sequence of non-white space characters following any one of "/", "\", "#'", "<-i"
is ignored.

Definition 2 is a heuristic to distinguish between abbreviations and "real" words. This 
definition has a fairly good hit-ratio, and is very "cheap" to implement. Using dictionaries 
would work better, but is more "expensive" to implement. 

Definition 3 is also mainly geared at distinguishing between abbreviations and words. One 
wants to ignore embedded dots as those are often used in e-mail addresses and other words in 
computer-related literature. 

Definition 4 filters out most formatting commands used by different text processing systems 
(LaTeX, TeX, nroff). 
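
Definitions 1, 2, and 4 can be illustrated with a short Python sketch. It is not the original implementation, and several details are assumptions made only for the example: the names extract_words, VOWELS, COMMAND, and WORD are hypothetical; a backslash is taken as the only formatting-command character; an apostrophe is taken as the second special character allowed inside a word; "y" is counted as a vowel (the sample output in this report lists "by", which suggests something similar); and words are folded to lower case.

```python
import re

# Assumption: "y" counts as a vowel, so that short words such as "by" survive.
VOWELS = set("aeiouy")
# Assumption: a backslash starts a formatting command (TeX/LaTeX style).
# Everything up to the next white space is ignored (Definition 4).
COMMAND = re.compile(r"\\\S*")
# Letters, optionally joined by "-" or "'" (the apostrophe is an assumption).
WORD = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def extract_words(text):
    """Return the tokens that qualify as words under the heuristic."""
    words = []
    cleaned = COMMAND.sub(" ", text)
    for token in cleaned.split():  # Definition 1: white space delimits tokens
        candidate = token.strip(".,;:!?()\"").lower()  # drop surrounding punctuation
        # Definition 2: alphabetical characters only, and at least one vowel.
        if WORD.fullmatch(candidate) and VOWELS & set(candidate):
            words.append(candidate)
    return words


line = r"The TCP chip, \textbf{designed} at u-aizu.ac.jp, uses a self-timed bus."
print(extract_words(line))
```

In this example the vowel-less abbreviation "TCP", the formatting command, and the host name with embedded dots are all skipped, while the hyphenated "self-timed" is kept as a single word.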

The implementation was done in C, and comprises about 200 lines of code. The program
works as a filter, taking the text to be examined as input and producing a statistic as output. 
Since a hash-table is used for the word-frequency test, the run-time performance is basically 
proportional to the size of the text, where text with rich vocabulary might take slightly longer 
than the same size text with a simple repertoire. 
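
The filter-plus-hash-table design described above can be sketched as follows. This is a minimal Python illustration rather than the C program itself: the name count_stream is hypothetical, Python's built-in dict stands in for the hash table, and plain white-space splitting stands in for the word definitions given earlier.

```python
import io


def count_stream(lines):
    """Make one pass over the input, updating a hash table once per word."""
    frequencies = {}  # a Python dict is a hash table
    total = 0
    for line in lines:
        for word in line.lower().split():
            # Average O(1) lookup and update, so the whole pass is
            # roughly proportional to the size of the text.
            frequencies[word] = frequencies.get(word, 0) + 1
            total += 1
    return total, frequencies


total, frequencies = count_stream(io.StringIO("the model\nthe system\n"))
print(f"vocabulary = {len(frequencies)}")
print(f"words = {total}")
```

Passing sys.stdin instead of the StringIO object would let the same function act as a filter in a shell pipeline or from within an editor.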

The program can be used as is or via text editors like "vi" or "emacs". The program and
editor-scripts can be downloaded via: 

ftp://ftp.u-aizu.ac.jp/u-aizu/ch7count-words.tar.gz 

The following shows a sample of the data from the Annual Review after the software was 
employed: 

vocabulary = 7683 
words = 51983 

the(2724), of(2481), and(1778), a(1203), in(1083), to(771), is(715), for(641),
on(525), with(328), are(323), we(264), by(259), this(256), as(253), an(238),
computer(237), that(225), research(185), be(181), systems(181), system(171),
laboratory(164), software(156), university(150), design(147), which(146),
model(143), can(135), professor(125), parallel(124), new(123), method(121),
has(120), based(119), processing(113), from(111), achievement(110), have(110),
it(109), summary(109), also(103), using(101), data(97), international(97),
algorithm(96) ....

a(1203), ability(5), able(5), abnormal(1), about(18), above(7), abroad(1),
abrupt(1), absence(1), absorption(6), abstract(15), abstracted(1),
abstracting(1), abstraction(3), abstractly(1) ....
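
Output in this word(number) form is easy to read back for further analysis. The sketch below, with hypothetical names parse_listing and coverage, parses the first three entries of the sample and computes the share of the 51,983 running words they account for; it is offered only as an illustration of how the listing might be used.

```python
import re

# One entry looks like "word(number)"; hyphens and apostrophes are allowed in the word.
ENTRY = re.compile(r"([A-Za-z'-]+)\((\d+)\)")


def parse_listing(listing):
    """Turn 'the(2724), of(2481)' into [('the', 2724), ('of', 2481)]."""
    return [(word, int(count)) for word, count in ENTRY.findall(listing)]


def coverage(entries, total_words):
    """Fraction of all running words accounted for by the given entries."""
    return sum(count for _, count in entries) / total_words


top = parse_listing("the(2724), of(2481), and(1778)")
print(f"{coverage(top, 51983):.1%}")  # prints 13.4%
```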
3 Results



The following sections list the corpora that resulted from this study and are thought to be of 
value to University of Aizu students preparing for English instruction and computer science 
discourse during their freshman year in Composition 1 and 2. The lists do not contain ALL the 
words that students need to learn during their four-year course of study, but only those words 
that would be appropriate for mastery during their freshman year of studies. 

3.1 University of Aizu Academic Vocabulary 

absence(s); (to be) absent (from) 
abstract (of an article) 

analysis (analyses); to analyze; analytic, analytical; analytically

answer(s); to answer (a question) 

(to make) an appointment (appointments) 

(to give) an assignment (assignments); to assign

(to take) attendance; to attend (class, a conference, a meeting, a seminar) 
to call on (someone) 

class(es) (= a group of students or a course)
(academic) conference(s) 

(to make) a correction (corrections); to correct; correct (answers); (to do something) correctly

course(s) 

coursework 

(to set, to meet) a deadline (deadlines) 
(to make) a deletion (deletions); to delete 

(to give) direction(s); to direct; (a) direct (person); (to speak) directly 
(to have) a discussion (discussions); to discuss 
(to be) due

(to get an) education; to educate
to erase 

(to make) an error (errors) 
evaluation(s); to evaluate 

(to give, to have, to take) an exam(ination) (exam(ination)s); to examine
(to give) an example (examples); to exemplify; exemplary
grade(s); to grade 

to hand in (something), to hand (something) in 

(to distribute) a handout (handouts); to hand out (something), to hand (something) out

(to do, to complete) homework 

illustration(s); to illustrate 

(to give) instruction(s); to instruct 

(academic, personal) journal(s) 

keyboard(s); to key/type in (data), to key/type (data) in 

knowledge; to know; (to do something) knowingly 

(to give, to attend, to listen to) a lecture (lectures); to lecture 

login(s); to log in 

logout(s); to log out 

mark(s); to mark (a paper) 

(magic, whiteboard) marker(s)

to miss (a class, a quiz); (to be) missing

(to make) a mistake (mistakes); to mistake (one thing or person) for (another thing or person); mistaken; mistakenly
(to express, to give, to offer) an opinion (opinions) 
(to write, to do, to give, to present) a paper (papers) 
paragraph(s); to paragraph 

(class) period(s); periodic(al) (test); periodically (review)

plagiarism; to plagiarize 

point(s) (on an exam, in a course)

practice; to practice 

printout(s); to print out (something), to print (something) out 

(to have, to solve) a problem (problems); problematic 

publication(s); to publish 

(to ask) a question (questions); to question 

(to give, to take) a quiz (quizzes); to quiz

to raise (one's) hand 

(to do, to conduct) research; to research

(to give) a/some response; to respond 

requirement(s); to require 

review(s); to review 

schedule(s); to schedule (something) on (date) for/at (time)

semester(s) 

seminar(s) 

sentence(s) 

solution(s); to solve 

study (studies); to study; studious; studiously

syllabus (syllabi/syllabuses) 

(to take, to have, to give) a test (tests); to test 

text(book)(s) 

theory (theories); to theorize; theoretical; theoretically 

(sub)title(s); to (sub)title 

university (universities) 

whiteboard(s)

(to do) work; to work

workshop(s) 

workstation(s) 



3.2 University of Aizu Physical Environment Vocabulary 

administration complex/building 

athletic field 

auditorium 

bus stop 

classroom(s) 

clubroom(s) 

computer assisted instruction room(s), CAI room(s)

computer lab(s) 

conference room 

(quadrangle) courtyard(s)

drinking/water fountain 

energy center 







field house 
gymnasium 

language media lab(oratory)/lab(oratories), LML(s)
lecture hall 

library lounge(s), research library/libraries

martial arts hall 

men's/women's locker room(s) 

(main) library 

monitor(s) 

mouse (mice) 

multimedia center 

office(s)

parking lot(s) 

printer(s) 

relaxation room (= SLRU 1) 

research lab(s) 

research quadrangles 

men's/women's rest room(s)

(computer, TV) screen(s) 

stairwell(s) 

student hall 

tennis court(s) 

track 

university bookstore 
university cafeteria 
university park 
university pool 
university restaurant 
university shop 
waterside park 
weight room (= SLRU 2)
3.3 University of Aizu People Vocabulary



Common Noun Forms                                Proper Noun Forms (Official Titles)

university president                             President, University of Aizu
director of the department for student affairs   Director, Department for Student Affairs
department head(s)                               Director, Department of Computer Hardware
                                                 Director, Department of Computer Software
center head(s)                                   Director, Center for Language Research
                                                 Director, Center for Mathematical Sciences
                                                 Director, Center for Cultural Research and Studies
(university) faculty
  = professors, associate professors, assistant professors, and research associates
professors
  = professors, associate professors, assistant professors
(full) professor(s)                              Professor(s)
associate professor(s)                           Associate Professor(s)
assistant professor(s)                           Assistant Professor(s)
research associate(s)                            Research Associate(s)
office staff
secretary (secretaries)
food staff
sales clerks/staff
maintenance personnel
security guard(s)
graduate student(s)
undergrad(uate) (student)(s)
senior(s)
junior(s)
sophomore(s)
freshman/freshmen





3.4 University of Aizu Computer Science Discourse Vocabulary 

The following lists of noun, gerund, verb, and adjective forms were constructed after an analysis 
of lexical frequency was run on the University of Aizu 1993 Annual Review: School of Computer 
Science and Engineering. Technical items that primarily appear in professional computer science 
discourse (and thus requiring definition in computer science glossaries) are marked with a double 
asterisk (**). Semi-technical items that the computer science community has adopted from 
general English and added special meanings to are marked with an asterisk (*). All other items 
are of a non-technical nature (i.e., general English) but used often in computer science discourse.[3]
Other parts of speech, such as adverbs, articles, and prepositions, have not been listed since the 
items that appeared under these categories are already known by most incoming freshmen. 

[3] Vocabulary that has been listed in other categories, such as the list of University of Aizu Academic
Vocabulary, has not been repeated here.
3.4.1 Nouns and Gerunds



computer(s), system(s), software, design(s), model(s), method(s), *processing, achievement(s),
summary (summaries), data/datum (data), *algorithm(s), information, recognition, project(s),
development(s), *modeling, *language(s), time, result(s), performance(s), conference(s),
application(s), *hardware, *network(s), science(s), simulation(s), approach(es), management, society
(societies), use(s), communication, *computing, circuit(s), control(s), device(s), *multimedia,
case(s), *architecture, engineering, graph(s), process(es), specification(s), **VLSI (Very Large
Scale Integration), *graphics, shape(s), application(s), audio, number(s), work, state(s),
technology (technologies), image(s), program(s), structure(s), technique(s), current, electron(s),
object(s), physics, sound, speech, group(s), space(s), member(s), environment(s), *interface(s),
*programming, set(s), surface(s), order(s), property (properties), circuit(s), logic, memory,
*processor(s), support, activity (activities), behavior, function(s), **CAD (= Computer-Aided
Design), point(s), representation(s), world(s), agent(s), courseware, learning, nuclei, base(s),
*database(s), hand(s), machine(s), reality, semantics, synthesis, computation(s), feature(s),
function(s), implementation, mathematics, *protocol(s), *editor(s), form(s), goal(s), action(s),
character(s), evaluation(s), verification, visualization, effect(s), generation, term(s), tool(s),
animation, area(s), basis (bases), complexity (complexities), concept(s), description(s), search(es),
*semigroup(s), signal(s), *tree(s) (= diagram), component(s), course(s) (= direction), element(s),
foundation(s), optimization(s), **semiconductor(s), user(s)

Note: These nouns (the singular form, the plural form, or both forms) or gerunds appeared 20 
or more times in the text. They are listed in descending order of frequency. 

3.4.2 Verb Forms 

be, design, model, has, base, process, use, develop, distribute, test, study, compute, control, 
propose, graph, work, state, *program, structure, give, order, show, support, consider, proceed,
learn, apply, follow, present, include, obtain, transport, form, route, call, investigate, patch, 
search, signal, relate, advance, allow, gain, make, provide, accept, implement, press, report, 
decentralize, speed up, know, need, train, change, cross, define, evaluate, exist, delay, invite, 
suggested, understand, balance, connect, describe, feature, planning, point, review, time,
window, account for, branch, find, like, open, transfer, free, generate, map, practice, queue, service,
specify, teach, view, write, cluster, flow, mean, operate, perform, [illegible], truncate

Note: These verbs (or various forms of these verbs) appeared 10 or more times in the text. They
are listed in descending order of frequency.

3.4.3 Adjective Forms 

parallel, new, international, such, mathematical, human, some, other, complex, each, self-timed, 
asynchronous, all, current, more, both, various, any, first, virtual, high, scientific, several, gen- 
eral, computational, many, technical, academic, neural, basic, geometric, important, main, con- 
tinuous, different, educational, structural, efficient, global, spatial, fast, formal, geometrical, 
large, **object-oriented, real, visual, quantum, dynamic, possible, simple, infrared, joint,
theoretical, second, disjoint, finite, single, common, far, optimal, particular, physical, abstract,
better, algebraic, effective, local, multiple, synthetic, artificial, conventional, cultural,
experimental, genetic, handwritten, light, natural, necessary, potential, singular, bipolar, *fuzzy,
[illegible], major, nuclear, adaptive, logical, national, open, standard, topological, concurrent,
free, good, intelligent, interactive, numerical, arbitrary, autonomous, available, certain,
dimensional, hierarchical, hot (= popular), minimum, next, nonlinear, powerful, same, specific, wide

Note: These adjectives (or noun forms functioning primarily as adjectives) appeared 10 or more 
times in the text. They are listed in descending order of frequency. 

3.5 Miscellaneous Research Statistics 

lexical base:[4] 51,983 items
approximate portion of lexical base already known[5] by students: 4,109 items (7.9%)
lexical range:[6] 7,683 items
approximate percent of lexical range already known by students: 624 items (8%)
approximate percent of lexical base to be taught: 11,200 items (21.5%)
approximate percent of lexical range to be taught: 640 items (8.3%)
approximate number of additional items to be taught: 330 items
approximate number of total items to be taught: 970 items[7]
percentage of 970 items that are technical: 3 (0.3%)
percentage of 970 items that are semi-technical: 30 (3%)
percentage of 970 items that are non-technical (general English): 937 (96.7%)

This study not only identified items that are appropriate for overt instruction in freshman 
Composition 1 and 2 (the course primarily responsible for attention to vocabulary, grammar, 
and introductory writing instruction), but this project also confirmed our suspicions that the 
majority of words that students need to know to negotiate their way through computer science 
content material are words common to general English rather than words of a semi- or highly 
technical nature. The following statistics illustrate this point: 

lexical base: 51,983 items 

lexical range: 7,683 different items 

percent of lexical base that is technical: approx. 200-300 (0.4-0.6%)
percent of lexical range that is technical: approx. 180-250 (2.3-3.3%)
percent of lexical base that is semi-technical: 1550-1650 (3.0-3.2%)
percent of lexical range that is semi-technical: 900-1200 (11.7-15.6%)
percent of lexical base that is non-technical: 50,000-50,200 (96.2-96.6%)
percent of lexical range that is non-technical: 6200-6600 (81.1-86.0%)
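
Readers who wish to check these proportions can recompute them from the raw counts. The short Python sketch below does so for the technical and semi-technical rows; the helper name share is hypothetical, and the counts are taken directly from the list above.

```python
LEXICAL_BASE = 51_983   # running words in the Annual Review
LEXICAL_RANGE = 7_683   # different lexical items


def share(count, whole):
    """Percentage of `whole` represented by `count`, rounded to one decimal place."""
    return round(100 * count / whole, 1)


# Technical items as a share of the lexical base (low and high estimates).
print(share(200, LEXICAL_BASE), share(300, LEXICAL_BASE))
# Semi-technical items as a share of the lexical base.
print(share(1550, LEXICAL_BASE), share(1650, LEXICAL_BASE))
# Semi-technical items as a share of the lexical range.
print(share(900, LEXICAL_RANGE), share(1200, LEXICAL_RANGE))
```

The same helper can be applied to any of the other rows.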

[4] The lexical base is composed of all common and proper nouns, initials, abbreviations, and acronyms in the
Annual Review.

[5] We assume that all words mutually taught in Japanese junior and senior high schools constitute words already
known by students.

[6] The lexical range is the number of different lexical items.

[7] Singular forms, plural forms, and changes in the verb form each constitute a separate item.
For introductory language instruction at the freshman level, it is obvious that concentration
on the general English vocabulary most common to academic instruction and computer science 
discourse would prioritize English language instruction in the most logical and efficient way. 



4 Educational Applications 

Though the intent of this report is to identify and report the English vocabulary that is necessary 
for incorporation into freshman English composition instruction, it would be appropriate here 
to suggest briefly how these corpora might best be taught.

Clearly, vocabulary is learned through association, both contextual and relational. Research 
has demonstrated that human beings "learn" a word by associating the written and/or spoken 
forms with the context in which the word is "experienced" and/or by associating it with other 
items in the learner's personal lexicon (Hayes-Roth, B. & Hayes-Roth, F., 1977; Lehrer, A., 1978;
Meara, P., 1983, 1984; Carter & McCarthy, 1988; Gass & Selinker, 1994). If we want vocabulary
instruction in Composition 1 and 2 to be successful, we must be sensitive to this principle. This
means that we must use the vocabulary in natural contexts, both in the dissemination of content 
material about academic life and computer science and in course assignments that encourage 
students to employ these words to convey their thoughts to one another in class. We envision a
combination of problem-solving exercises, student-teacher journaling, essay writing, and elec- 
tronic correspondence that will provide plentiful opportunities for reading and using these words 
in natural communicative contexts. In addition, we plan to experiment with various vocabulary 
ideas presented in one of TESOL's latest publications, New Ways in Teaching Vocabulary (Nation,
1994), to see which activities are applicable to our situation here at the University of Aizu.
Also, pre- and post-tests will be employed to chart progress and monitor the success of both 
students and the instructional activities. Finally, a year from now, a follow-up report
will document our findings and suggest revisions to the curriculum for the following year. Only
if such a cyclical process is continued year after year can thoroughly informed and
effective composition instruction be guaranteed.

Acknowledgement: Professors Thomas Orr, Kiel Christianson, and Christian Goetze 
would like to express a note of thanks to Hideaki Okawara for assisting us in many small details 
of this project. One of our goals at the University of Aizu is to produce university research 
professors, an occupation Mr. Okawara personally seeks. It is our hope that this on-the-job 
training in one small bit of university research and publication will not only begin to build 
his personal vita but also begin to lay the foundation of experience that may prove useful for 
future activity in research and publishing. Hideaki, on this day, your 20th birthday, we wish 
you success in your pursuit of knowledge and expertise in computer science.